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Optimal utilization of acoustic cues during auditory categorization is a vital skill, particularly 
when informative cues become occluded or degraded. Consequently, the acoustic 
environment requires flexible choosing and switching amongst available cues. The 
present study targets the brain functions underlying such changes in cue utilization. 
Participants performed a categorization task with immediate feedback on acoustic 
stimuli from two categories that varied in duration and spectral properties, while we 
simultaneously recorded Blood Oxygenation Level Dependent (BOLD) responses in fMRI 
and electroencephalograms (EEGs). In the first half of the experiment, categories could 
be best discriminated by spectral properties. Halfway through the experiment, spectral 
degradation rendered the stimulus duration the more informative cue. Behaviorally, 
degradation decreased the likelihood of utilizing spectral cues. Spectrally degrading the 
acoustic signal led to increased alpha power compared to nondegraded stimuli. The 
EEG-informed fMRI analyses revealed that alpha power correlated with BOLD changes 
in inferior parietal cortex and right posterior superior temporal gyrus (including planum 
temporale). In both areas, spectral degradation led to a weaker coupling of BOLD response 
to behavioral utilization of the spectral cue. These data provide converging evidence from 
behavioral modeling, electrophysiology, and hemodynamics that (a) increased alpha power 
mediates the inhibition of uninformative (here spectral) stimulus features, and that (b) the 
parietal attention network supports optimal cue utilization in auditory categorization. The 
results highlight the complex cortical processing of auditory categorization under realistic 
listening challenges. 
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INTRODUCTION 

The interpretation of acoustic signals is an essential human 
skill for goal-directed behavior and vocal communication. The 
core process underlying this skill — auditory categorization — has 
been shown to be highly flexible and adaptive, and allows, 
for instance, speaker recognition in a cocktail party situation 
(Zion Golumbic et al., 2013), or speech comprehension in 
noise (Nahum et al., 2008). In both cases, attention has to be 
directed to the most informative aspect of the acoustic signal 
(Hill and Miller, 2010). 

Neurophysiological studies have suggested that the relative 
weighting of information during categorization {information gain 
or cue weighting, cf Holt and Lotto, 2006) may be subserved 
by the interplay between excitatory and inhibitory mechanisms 
(Thut et aL, 2006; Rihs et al., 2007; Weissman et al., 2009). 
One promising neurophysiological marker of functional inhibi- 
tion processes are brain oscillations recorded using electroen- 
cephalography (EEG), predominantly in the alpha frequency 
range (8-13 Hz, Foxe et al., 1998; Foxe and Snyder, 2011; Weisz 
et al., 2011, 2013; Klimesch, 2012). Initially, alpha power had 
been interpreted as reflecting the degree to which primary cor- 
tical areas are in an "idling" mode (Adrian and Matthews, 1934; 
Niedermeyer and Silva, 2005). More recent studies on auditory 



comprehension, on the other hand, have shown that the pro- 
cessing of degraded speech stimuli is accompanied by relative 
decreases in alpha power suppression, i.e., relative increases in 
alpha power (Obleser and Weisz, 2012; Becker et al., 2013). One 
interpretation of this finding is that relative increases in alpha 
power index greater attention and working memory demands 
under degradation (Ronnberg et al., 2008; Wild et al., 2012). 
It has been further proposed that brain regions showing high 
alpha power undergo inhibition, which in turn allows enhanced 
processing of task-relevant information (Klimesch et al., 2007). 

Brain areas underlying the processing and categorization of 
acoustic information have been identified by means of func- 
tional magnetic resonance imaging (fMRI). Previous studies have 
shown that the posterior part of the superior temporal gyrus 
(pSTG) is crucially involved in auditory categorization and dis- 
crimination (Hall et al., 2002; Guenther et al., 2004; Husain et al., 
2006; Desai et al., 2008; Bermudez et al., 2009; Sharda and Singh, 
2012). Importantly, in most of these studies, auditory categoriza- 
tion was also subserved by the planum temporale (PT) in the 
pSTG. The PT has recently received particular attention, because 
it does not only play a general role in auditory categorization 
(Griffiths and Warren, 2002; Husain et al., 2006; Obleser and 
Eisner, 2009) but also a more specific one with regard to the 
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processing of spectral information and pitch (Hall and Plack, 
2009; Alho et al., 2014). 

Furthermore, feature-selective attentional processes play a cru- 
cial role in categorization. Studies concerned with aspects of 
selective attention during categorization have mainly focused 
on the visual system (Yantis, 1993; Posner and Dehaene, 1994; 
Corbetta et al, 2000; Yantis, 2008). These studies identified the 
inferior parietal lobule (IPL) as an important, hub-like structure, 
being involved when participants focus attention on informative 
stimulus features (Shaywitz et al., 2001; Behrmann et al., 2004; 
Geng and Mangun, 2009; Salmi et al., 2009; Schultz and Lennert, 
2009; Gillebert et al, 2012). Existing research on attention in 
audition has further provided evidence for the involvement of 
the parietal network (Rinne et al, 2007; Salmi et al, 2009; Hill 
and Miller, 2010; Henry et al., 2013). In addition, a recent struc- 
tural imaging (voxel-based morphometry) study also highlighted 
the role of the IPL in categorization processes (Scharinger et al, 
2014). 

More recently, the possibility to combine recordings of 
EEG oscillatory activity and fMRI Blood Oxygenation Level 
Dependent (BOLD) activity has been explored in several imag- 
ing studies. Simultaneous EEG-fMRI recordings (Ritter and 
Villringer, 2006; Sadaghiani et al., 2010, 2012) suggest that alpha 
power can be negatively (Goldman et al., 2002; Laufs et al, 2003; 
Ritter and Villringer, 2006) or positively (Moosmann et al., 2003; 
Liu et al., 2012) correlated with brain metabolism, depending on 
the brain regions these correlations are observed in. However, 
multi-modal neuroimaging evidence on auditory cue weighting 
during categorization has been essentially absent. Most studies 
concerned with a functional coupling of alpha power and BOLD 
signal in selective attention tasks compared the processing of 
task-relevant information with the processing of task- irrelevant 
distractor information (e.g., Scheeringa et al., 2012). 

It is thus less clear how multiple, potentially competing cues 
provided by the same acoustic stimulus, will be reflected in alpha- 
tuned functional processes and concomitant BOLD change. To 
this end, we designed two stimulus sets for auditory categoriza- 
tion. In the first stimulus set, categorization could be based on 
spectral properties or physical duration, with spectral properties 
being more informative. In the second stimulus set, sound dura- 
tion became the more informative cue, while spectral properties 
could still be used for categorization. Using combined EEG/fMRI, 
we asked (a) whether auditory categorization yields a behavioral 
preference for the most informative stimulus cue in each condi- 
tion; (b) which brain areas support change in cue utilization, (c) 
whether alpha power shows relative increases under degradation 
and (d) whether alpha power correlates with BOLD in brain areas 
dedicated to the processing of acoustic cues. 

MATERIALS AND METHODS 
PARTICIPANTS 

Sixteen healthy volunteers were recruited from the participant 
database of the Max Planck Institute for Human Cognitive and 
Brain Sciences (7 females, age range 20-29 years, age 25 ± 2.7 
years mean ± standard deviation). They were all right-handed, 
native speakers of German with no self-reported hearing impair- 
ments or neurological disorders. Due to technical problems with 



EEG acquisition in the magnetic resonance (MR) scanner, we 
had to exclude one participant from further analyses. Participants 
gave written informed consent and received financial compensa- 
tion for their participation. All procedures followed the guidelines 
of the local ethics committee (University of Leipzig) and were in 
accordance with the Declaration of Helsinki. 

STIMULI 

Stimuli were based on spectral and durational modifications of 
an inharmonic base signal. This base signal was constructed by 
adding 16 exponentially spaced sinusoids (ratio between suc- 
cessive components: 1.15) to the lowest sinusoid component 
frequency of 500 Hz (Goudbeek et al, 2009; Scharinger et al, 
2014). We modified the spectral properties of individual sounds 
by applying a band-pass filter with a single frequency peak, using 
a second order infinite impulse response (IIR) filter with a band- 
width corresponding to a fifth of its frequency peak. The term 
"spectral peak" is henceforth used to refer to the filters' center 
frequency, which also describes the resulting spectral properties. 
Duration modifications were based on differences in the length of 
the sounds. 

Individual members of category distributions, arbitrarily 
labeled "A" and "B," varied on the basis of spectral peak and 
duration: For individual sounds of each category, spectral filter 
frequencies and durations were randomly drawn from bivariate 
normal distributions. These distributions, with equal standard 
deviations, cr, differed in their means, [l, between the two cat- 
egories, A and B (Table 1). Thus, each individual sound was 
characterized by the two dimensions, duration and spectral peak, 
with means of duration and spectral peak differing between 
the two category distributions. Each category distribution con- 
sisted of 1000 sound exemplars from which a random sample 
was drawn for each participant in the experiment. Following 
Smits et al. (2006), we converted spectral peak frequency and 
duration to scales that allowed for psychoacoustic comparability. 
Consequently, frequencies were converted to the equivalent rect- 
angular bandwidth (ERB) scale that approximates the bandwidths 
of the auditory filters in human hearing (Glasberg and Moore, 
1990), and durations were converted to a logarithmic scale (DUR; 
cf. Smits et al., 2006). Table 1 illustrates the means (spectral peak 
and durations) of the category distributions in psychophysical 
and physical units. 

In the first half of the experiment (nondegraded condition), 
the two stimulus distributions did not overlap in their spec- 
tral peak, but i of the sounds in category A and B overlapped 
in duration (Figure lA top). This set-up aimed at biasing par- 
ticipants to focus on spectral cues while sound duration may 
serve as secondary cue. In the second half of the experiment 
{degraded condition), spectral cues were modified by applying 
four-band noise vocoding to the original stimulus distributions 
(DruUman et al., 1994; Shannon et al., 1995). Noise vocoding was 
done by dividing the original signal into four frequency bands, 
extracting the amplitude envelope from each band and reapply- 
ing it to bandpass-filtered noise carriers with matched cut-off 
frequencies. Envelopes were extracted using a zero-phase, 4th- 
order Butterworth low-pass filter; the low-pass filter cutoff was 
set at 256 Hz. Scaling for equal root mean square (RMS) energy 
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FIGURE 1 I Stimulus characteristics and behavioral results. (A) Top: 
Complex sounds differing in spectral peak (expressed in ERB; y-axis) and 
duration (expressed in DUR; x-axis). Distributions are indicated by ellipses, 
with black dots illustrating distributions for a representative participant. 
Bottom: Stimulus wave form and spectrogram illustrate the complex 
structure of sounds in the nondegraded condition (left) and the spectral 
smearing as a result of vocoding in the degraded condition (right). Duration 
and amplitude envelope were unaffected by degradation. (B) Results of 
behavioral discrimination. Top: Perceptual sensitivity (d') overtime, obtained 
from sliding windows over nondegraded and degraded trials per participant 
(window size = 20 trials, step size = 1 trial). Bottom: Comparison between 
by-subject cue indices in the nondegraded and degraded conditions. Mean 
cue index values for the nondegraded and the degraded conditions are 
connected for each participant. 



Table 1 | Means and standard deviations (in parentheses) of spectral 
peak and duration distributions for stimulus categories A and B in 
the nondegraded and degraded conditions (psychophysical and 
physical units). 

Stimulus category Nondegraded Degraded 

A B A B 



Spectral peak (ERB) 20.00(0.31) 17.00(0.31) 16.80(0.31) 15.50(0.31) 

Spectral peak (Hz) 1739(8) 1196(8) 1166(8) 984(8) 

Duration (DUR) 47.70(1.31) 52.53(1.31) 47.70(1.31) 52.53(1.31) 

Duration(ms) 118(1.14) 191(1.14) 118(1.14) 191(1.14) 



was performed channel-wise for each channel envelope (Rosen 
et al., 1999; Erb et al., 2012). We chose four-band noise vocoding 
because it offers a well-established reduction of spectrally-based 
intelligibility (cf Scott et al, 2006; Obleser and Kotz, 2010; 
Obleser et al., 2012), thereby ensuring comparability to studies on 
alpha power suppression in speech, while simultaneously being an 
ecologically valid modification by simulating effects of cochlear 
implants (Poissant et al., 2006). 

Noise vocoding led to a smearing of spectral detail, while 
amplitude envelope features and original stimulus duration 
remained unaffected (Figure lA, bottom). Thus, as demonstrated 
before (Scharinger et al., 2014), we aimed at inducing a change in 
acoustic cue utilization, from spectral peak in the first (nonde- 
graded) condition, to stimulus duration in the second (degraded) 
condition of the experiment. The stimulus degradation in the 
second half of the experiment therefore targeted the spectral 
properties (i.e., spectral peak, but also affected other spectral 
features such as harmonicity). Thus, degradation of the initially 
informative spectral cue ought to decrease participants' reliance 
on that cue and prompt a relatively increased reliance on the 
duration cue. 

All stimuli were normalized for equal root-mean-square inten- 
sity and presented at ~60 dB SPL. Onset and offset ramps (5 ms) 
ensured that acoustic artifacts were minimized. 

EXPERIMENTAL PROCEDURE 

Participants were first familiarized with the categorization task in 
the scanner and had to complete a short practice run consist- 
ing of 20 sounds (10 from category A and 10 from category B) 
that did not occur in the main experiment. The subsequent 
main experiment was arranged in four runs: Two initial runs 
with nondegraded sounds, and two subsequent runs with spec- 
trally degraded sounds (Figure lA, top). In each run, 60 sound 
exemplars, randomly drawn from categories A and B with equal 
probability, were presented in a sparse imaging design in the MR 
scanner (Hall et al., 1999). The sparse design was chosen in order 
to guarantee that stimuli could be presented during silent periods 
in-between the acquisition of echo-planar images (EPI). At the 
same time, this design reduced contamination of the EEG signal 
by gradient switches during volume acquisition. 

On each trial, one acoustic stimulus was presented on aver- 
age 2 s after the offset of a preceding EPI sequence (±500 ms). 
Subsequently, a visual response prompt (green traffic light) was 



presented on a screen which participants viewed through a mir- 
ror 3 s after stimulus onset. Participants were then required 
to indicate whether the presented sound belonged to category 
A or category B by pressing one of two keys on a button 
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box. Button assignment was counterbalanced across participants. 
Following the response, participants received corrective feedback 
(Correct/Incorrect), which was displayed for 1 s in the middle of 
the screen. Five seconds after the onset of an acoustic stimulus, a 
subsequent EPI volume (acquisition time TA = 2 s) was acquired, 
such that the BOLD peak would best capture stimulus process- 
ing. At random positions within each run, 15 silent trials (=20% 
of all trials) without required responses served as baseline. The 
duration of the entire experiment with short breaks between runs 
was 50 min. 

ACQUISITION AND PRE-PROCESSING OF EEC DATA 

The continuous EEG was recorded inside the MR-scanner from 
3 1 Ag-AgCl electrodes mounted on an elastic cap according to the 
10-20 standard system (EasyCap-MR, Brain Products, Munich, 
Germany). The electrocardiogram (EGG) was registered with an 
additional electrode on the sternum. EEG signals were ampli- 
fied with an MR-conform 32-channel amplifier (BrainAmp MR; 
Brain Products, Munich, Germany) that did not get saturated by 
MR activity. Signals were recorded at a sampling frequency of 
5000 Hz and a resolution of 16 bits, referenced against FGz, using 
the BrainVision Recorder Software (Brain Products, Munich, 
Germany). The ground electrode was positioned between Fz and 
FPz. All impedances were kept below 5 kf2. 

Since we used a sparse imaging design with stimuli being pre- 
sented in-between two consecutive volume acquisitions, gradient 
artifact removal from the EEG was not necessary (cf. Herrmann 
and Debener, 2008; Huster et al., 2012). For preprocessing, a 
finite impulse response (FIR) 100 Hz low-pass filter (389 points, 
Hamming window) and a 1.7 Hz high-pass filter (4901 points, 
Hann window, corresponding to a cut-off period of 1/1.7 Hz = 
588 ms) was applied to the raw data. Note that filter settings were 
chosen such that smearing of gradient artifacts into time windows 
of interest were prohibited. Subsequently, filtered EEG data were 
down-sampled to 500 Hz and subjected to an independent com- 
ponents analysis (ICA) for artifact correction, using the routines 
provided by EEGLab (Delorme and Makeig, 2004) and field- 
trip (Oostenveld et al, 2011) within MATLAB 7.9 (MathWorks, 
Natick, MA). Note that the EGG channel was removed prior 
to ICA analysis. IGAs were calculated on 3-s epochs, with 1 s 
before and 2 s after stimulus onset. The separation of IGA com- 
ponents (total: 29) representing artifacts from those representing 
physiological EEG activity was done by visual inspection of the 
components' time-courses, topographies, and frequency spectra 
(cf. Debener et al., 2010), using custom-made fieldtrip scripts. 
Components either showing similar dynamics as the ECG chan- 
nel or resembling electroocculogram activity as illustrated in 
Debener et al. (2010) were considered artifacts. Note that it 
has been observed that ICA-based correction of cardio-ballistic 
artifacts performs better than standard artifact subtraction meth- 
ods (Debener et al, 2007; Jann et al., 2009). On average, 7 
components were therefore excluded (range: 5-9) by using the 
ICA-based artifact removal within fieldtrip (Oostenveld et al., 
2011). 

We furthermore identified bad EEG channels after artifact 
removal as channels exceeding a threshold of 150 |iV in more 
than 50% of all trials per participant. Bad channels (of which 



no participant showed more than 1) were interpolated by using 
signal information from the average of 4-5 neighboring channels 
(depending on channel location). 

In addition to EEG recordings inside the MR-scanner, we 
tested 18 different participants (9 females, mean age 25, range 20- 
31 years) outside the scanner. Presenting pre-recorded EPI sounds 
at times the scanner would have operated simulated the scan- 
ner noise. For this control group, the EEG was obtained from 64 
Ag-AgCl-electrodes (58 scalp electrodes, 2 mastoids, 2 electrodes 
for horizontal and 2 for vertical electrooculograms) on a Brain 
Vision EEG system (amplifier: BrainAmp, cap: BrainCap, Brain 
Products, Munich, Germany), arranged according to the extended 
10/20 system, (Oostenveld and Praamstra, 2001). Otherwise, 
stimulus presentation, EEG pre-processing and analyses were 
identical to the procedures described here. However, due to a 
technical problem with one participant, and more than 30% ICA- 
artifact components in two further participants, the resulting 
participant number of the control experiment was 15. This exper- 
iment served the purpose of testing the validity of the recordings 
obtained inside the scanner. Note, however, that overall magni- 
tude differences should not be compared between the experi- 
ments inside and outside the scanner, due to different recording 
equipment. 

ACQUISITION AND PRE-PROCESSING OFfMRI DATA 

Functional MRI data were recorded with a Siemens VERIO 3.0-T 
MRI scanner equipped with a 12-channel head coil, while par- 
ticipants performed the categorization task in supine position 
inside the scanner. Acoustic stimuli were transmitted through 
MR-compatible headphones (mr confon GmbH, Magdeburg, 
Germany). In-ear hearing protection (Hearsafe Technologies 
GmbH, Cologne, Germany) reduced scanner noise by approxi- 
mately 16 dB. 

Seventy-five whole-brain EPI volumes (30 axial slices, thick- 
ness = 3 mm, gap = 1 mm) in each of the 4 runs were 
collected every 9 s (TA = 2 s; TE = 30 ms; flip angle = 90°; 
field of view = 192 x 192 mm; voxel size = 3 x 3x 4 mm). 
High-resolution, 3D MP-RAGE Tl-weighted scans were used 
for localization and co-registration (acquired on a 3T Siemens 
TIM Trio scanner with a 12-channel head coil 29 months prior 
to the experiment, with the parameters: sagittal slices = 176, 
repetition time = 1300ms, TE = 3.46ms, flip angle = 10°, 
acquisition matrix = 256 x 240, voxel size = 1 x Ix 1mm). 
Voxel-displacement-maps for distortion correction (lezzard and 
Balaban, 1995; Hutton et al., 2002) were calculated on the basis of 
field maps (30 axial slices, thickness = 3 mm, gap = 1 mm, repe- 
tition time = 488 ms, TEl = 4.92 ms, TE2 = 7.38 ms, flip angle = 
60°, field of view = 192 x 192 mm, voxel size =3x3x3 mm). 

Functional (T2* -weighted) and structural (Tl-weighted) 
images were processed using Statistical Parametric Mapping 
(SPM8; Wellcome Department of Imaging Neuroscience, 
Institute of Neurology, University College of London). Functional 
images were first realigned using the 6-parameter affine trans- 
formation in translational (x, y, and z) and rotational (pitch, 
roll, and yaw) directions to reduce individual movement artifacts 
(Ashburner and Good, 2003). Subsequently, a mean image of 
each run-based image series was used to estimate unwarping 
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parameters, and voxel-displacement-maps were used for cor- 
recting magnetic field deformations (Jezzard and Balaban, 
1995; Hutton et al., 2002). Participants' structural images were 
manually pre-aligned to a standardized EPI template (Ashburner 
and Friston, 2004) in MNI space, improving co-registration 
and normalization accuracy. Next, functional images were co- 
registered to the corresponding participants' structural images 
and normalized to MNI space. Functional images were then 
smoothed using an 8-mm fuU-width half-maximum Gaussian 
kernel and subsequently used for first-level general linear model 
(GLM) analyses. 

ANALYSIS OF BEHAVIORAL DATA 

Our behavioral dependent measures were overall performance and 
cue utilization. Overall performance was estimated by d, a mea- 
sure of perceptual sensitivity that is independent of response bias. 
Perceptual sensitivity, d', was calculated from proportions of hits 
and false alarms according to a one-interval design (Macmillan 
and Creelman, 2005), where hits were defined as "category-A" 
responses to category-A stimuli, and false alarms were defined as 
"category-A" responses to category-B stimuli. Perceptual sensitiv- 
ity was calculated separately for each experimental run (2 non- 
degraded, 2 degraded runs). In order to visualize performance 
over time, we additionally calculated d' values in sliding windows 
(size: 20 trials, step size: 1 trials), separately for the nonde- 
graded and the degraded condition, and with the exclusion of null 
trials. 

The measure of cue index quantified individual participants' 
cue utilization (spectral peak vs. physical duration) in the follow- 
ing way: First, for each condition, the likelihood of a category-A 
response was predicted from the stimulus' physical properties, 
spectral peak and duration, by means of logistic regressions. 
The slope of the regressions function, expressed by absolute 
P, indicated the degree to which the corresponding physical stim- 
ulus property influenced the categorical response (Pspectral peak; 
Pduration; Goudbeek et al., 2009; Scharinger et al., 2013). Note that 
Pspectral peak and ^duration were estimated simultaneously. Second, 
the normalized difference between these p values [cue index) indi- 
cated participants' preference to rely on spectral peak (negative 
values according) or on duration (positive values). 

^ . , ^duration — f^spectral peak 

Cue index = — 

Pduration + Pspectral peak 

ANALYSIS OF EEG DATA 

For the analysis of the event-related potentials (ERPs), single-trial 
EEG epochs were first re-referenced to linked mastoids (approx- 
imated by channels Tp9 and TplO). Subsequently, epochs were 
filtered with a 20-Hz Butterworth low-pass filter and re-defined to 
include a pre-stimulus interval of 500 ms and a post-onset inter- 
val of 1500 ms. Baseline correction was applied by subtracting the 
mean amplitude of the —500 to 0 ms baseline interval from the 
epoch. Single-trials were averaged separately for the nondegraded 
and the degraded condition. Auditory Nl components (Naatanen 
and Picton, 1987) were identified by visual inspection in a time 
window between 100 and 150 ms post onset. Averaged amplitudes 



for Cz within the Nl time- window were compared between con- 
ditions (nondegraded, degraded) by means of dependent-samples 
f-tests. 

For time-frequency analyses, re-referenced EEG-data were 
down-sampled to 125 Hz and then decomposed with a Morlet 
wavelets analysis (Bertrand and Pantev, 1994), centered on win- 
dows that slid in steps of 10 ms along the temporal dimension 
(— 1 to 2 s). In the spectral dimension, we used 1-Hz bins from 1 
to 30 Hz. Wavelet widths ranged from 1 to 8 cycles, equally spaced 
over the 30 frequency bins. Time-frequency analyses were done 
separately for nondegraded and degraded trials. Mean power val- 
ues of a pre-stimulus baseline interval (—500 to —50 ms) were 
subtracted from the epoch. A time-frequency region of interest 
(ROI) was chosen according to the typical alpha-band interval 
(7-11 Hz) and according to epochs that previously showed the 
suppression effect in speech (400-700 ms post onset, e.g., Obleser 
and Weisz, 2012; Becker et al, 2013). A consistent and symmetric 
posterior electrode selection for subsequent EEG/fMRI correla- 
tions was based on electrodes where alpha power was strongest 
in above-mentioned ROI (within the nondegraded condition). 
These electrodes were: CPl, CP2, P7, P3, Pz, P4, P8, POz, Ol, Oz, 
and 02. Averaged power values in the alpha ROI was compared 
between conditions by means of dependent-samples t-tests. 

ANALYSIS OFfMRI DATA 

Activated voxels were identified using the GLM approach 
(Friston, 2004). At the first level, a GLM was estimated for 
each participant with a first-order finite impulse response (FIR; 
window = 2 s) and a high-pass filter with a cut-off of 128 s, 
representing standard settings for sparse imaging designs (cf. 
Peelle et al., 2010). The design matrix included regressors for 
sound trials (corresponding to volumes following sound repre- 
sentations), the mean-centered single-trial parametric modula- 
tor alpha power (obtained from the ROI defined above), and 
silent trials (corresponding to volumes following null trials). 
Experimental runs were included as regressors of no interest (one 
for each run). Six additional regressors of no-interest accounted 
for the realignment-induced spatial deformations of the EPI 
volumes. 

Resulting beta-maps were restricted to gray- and white mat- 
ter. This information was obtained from group-averages based 
on individual Tl -weighted scans. On the first level, the fol- 
lowing contrasts were calculated (separately for nondegraded 
and degraded conditions): sound trials against implicit baseline 
and parametric modulator alpha power against implicit base- 
line. Furthermore, we calculated the contrasts nondegraded > 
degraded and degraded > nondegraded. 

On the second level (group level), all contrasts were compared 
against zero using one-sample t-tests. Additionally, for each con- 
dition (nondegraded, degraded), sound-trial contrasts (against 
implicit baseline) from the first level were correlated with cue 
index using linear regression. Differences between nondegraded 
and degraded conditions in Cue index/BOLD correlation were 
assessed by testing the slopes of the linear regressions against each 
other using a dependent samples f-test. 

For statistical thresholding of second-level activations, we used 
a threshold oi p < 0.005 combined with a cluster extent of 15 
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voxels that corresponds to a whole-brain significance level of 
p < 0.05, as determined fi-om a MATLAB-implemented Monte 
Carlo simulation (Slotnick et al, 2003; Erb et al, 2013). 

In order to visualize BOLD modulation differences across 
conditions, ROIs of 10 mm radii were defined using the SPM 
toolbox MarsBaR (Brett et al., 2002). They were centered on the 
peak coordinates of significant clusters identified in the whole- 
brain analyses. For these regions, mean regression beta values 
were estimated for each participant. Note that no additional 
tests were conducted for these regions to avoid statistical circu- 
larity. Determination of anatomical locations was based on the 
Automated Anatomical Labeling Atlas (AAL; Tzourio-Mazoyer 
et al, 2002), and PT localization followed Westbury et al. (1999). 

RESULTS 
BEHAVIORAL DATA 

Participants performed above chance as indicated by d val- 
ues significantly greater than zero [mean d' = 1.51, SD = 0.43; 
f(]4) = 19.19, p < 0.01]. Participants' performance was charac- 
terized by a considerable improvement over the first twenty trials, 
as estimated from sliding-window averages of d'-values (window 
size: 20 trials, step size: 1 trial. Figure IB top). After degrada- 
tion was introduced, performance dropped to the initial level, 
but quickly regained a stable plateau and did not differ over- 
all from the nondegraded condition [nondegraded vs. degraded 
f(i4) = 1.00,p = 0.32]. 

Cue indices marginally differed between conditions 
[t(n) = 1.94, p = 0.07], with more negative values for the 
nondegraded than the degraded condition. This means that the 
tendency of utilizing spectral cues (i.e., a negative cue index) in 
the nondegraded condition decreased in the degraded condition 
(i.e., a positive-going cue index). However, a spectral strategy was 
never entirely given up, as judged from overall still negative cue 
indices in the degraded condition (Figure IB, bottom). 

EEG DATA 

TheNl (100-150 ms) of the ERP showed a typical central/midline 
topography (inside and outside the scanner). Nl mean amplitude 
marginally differed between the nondegraded and the degraded 
condition [f(i4) = 1.9, p = 0.08], with more negative values in 
the nondegraded than in the degraded condition. This effect 
reached significance outside the scanner [f(i4) = 7.89, p < 0.01; 
Figure 2A] . 

Alpha power (7-1 1 Hz) around 400-700 ms showed a central- 
posterior distribution and also differed significantly between 
conditions, with relatively higher alpha power for the degraded 
than for the nondegraded condition lt(u) = 2.06, p = 0.04 
Figure 2B] . Again, this effect also held for the control experiment 
outside the scanner [f(i4) = 2.56, p = 0.03; Figure 2C]. 

In order to assess the covariation of alpha power and cue index, 
we calculated correlations between mean alpha power and mean 
cue index per participant, and in addition, separately for the non- 
degraded and degraded condition. Overall, mean alpha power 
and mean cue index did not correlate significantly [r = 0.28, 
f(]4) = 1.07, p = 0.30]. This held both within the nondegraded 
[r = 0.23, t(i4) = 0.85, p = 0.41] and the degraded condition 
[r = 0.16, f(i4) = 0.60, p = 0.56]. 



fMRI DATA 

Overall auditory categorization network in parietal and temporal 
areas 

Results from group-level whole-brain analyses showed that the 
categorization of nondegraded and degraded sounds (compared 
to baseline) lead to activations in extensive bilateral temporo- 
parietal clusters, with peaks in inferior parietal lobule and post- 
central gyrus (see Figures). Furthermore, peaks in precentral 
and cingulate cortex were predominantly seen for nondegraded 
sounds, while degraded sounds showed activations in pSTG, PT, 
and Heschl's gyrus. Both conditions also revealed substantial acti- 
vations in middle frontal gyrus (MFC), inferior frontal gyrus 
(IFG), and in the dorsal medial nucleus of left Thalamus. 

More activation for degraded than for nondegraded sounds 
was found in right IFG (extending into the insula), left and right 
pSTG (including parts of PT, i.e., gray matter with a likelihood of 
25-45% being in PT according to Westbury et al, 1999), as well 
as right STG (extending into the insula). A detailed overview of 
the clusters is provided in Table 2. 

Alpha power covaries with BOLD activity in pSTG, PT, and IFG 

Group-level whole-brain analyses showed that single-trial alpha 
power correlated positively with BOLD only in the degraded 
condition. Here, alpha power/BOLD correlations occurred in 
two clusters in IFG (comprising pars triangularis and ventral 
orbito frontal cortex), in one cluster located in right pSTG (with 
25-45% probability of being in PT), and in one cluster in right 
angular gyrus. In the nondegraded condition, alpha power/BOLD 
correlations did not survive the statistical threshold. 

Stronger modulations of BOLD by alpha power could be 
observed in the orbital part of right IFG, as well as in bilateral 
pSTG, again comprising parts of the PT (with 25-45% probability 
according to Westbury et al., 1999; cf Table 3 and Figure 4A). 

Cue index modulates BOLD activity in parietal attention and 
temporal auditory network 

Group-level whole-brain regression analyses using the cue index 
showed positive correlations with BOLD in right MFG (anterior 
prefrontal cortex) only in the degraded condition. Here, a reduc- 
tion of using spectral cues corresponded to an increased BOLD 
signal in anterior prefrontal cortex. By contrast, cue index/BOLD 
correlations in the nondegraded condition did not survive the 
statistical threshold. 

Furthermore, positive cue index/BOLD correlations were 
stronger in the degraded than in the nondegraded condition in 
right dorso-lateral prefrontal cortex (covering parts of pars tri- 
angularis and pars opercularis), left pSTG/pSTS (extending into 
PT), left posterior MTG (involving parts in occipito-temporal 
cortex), right (ventral) IPL (involving parts of supramarginal 
gyrus and extending rostraUy into postcentral gyrus; cf Table 3 
and Figure 4B). 

DISCUSSION 

The two most important findings of this multimodal brain imag- 
ing study on auditory categorization are the following: First, 
auditory categorization of degraded stimuli yielded decreases in 
alpha power suppression (i.e., relative alpha power increases). 
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FIGURE 2 I EEG results. (A) Grand-average of evoked responses in tlie 
nondegraded (left) and degraded (rigfit) condition. ERP-differences between 
conditions were seen for the Nl , with a central/midline distribution 
(100-150 ms, indicated by gray bars). (B) Averaged time-frequency 
representations forthe nondegraded (left) and degraded (middle) condition, and 
difference between averages (degraded > nondegraded; right). The strongest 
effect of alpha suppression (compared to baseline) occurred at central-posterior 



electrodes (selection marked with black dots; 400-700 ms, 7-1 1 Hz), where it 
also significantly differed between conditions. (C) Averaged time-frequency 
representations from the control experiment outside the MR scanner 
(nondegraded: left, degraded: middle, difference: right). Differences and 
topographies are comparable to within-scanner recordings. Note that overall 
magnitude differences should not be compared between the experiments 
inside and outside the MR scanner, due to different recording equipment. 



which correlated with increased activation in right PT and IFG. 
Second, even though the behavioral measure of cue utilization 
only marginally differed between conditions, less reliance on 
spectral cues under sound degradation corresponded to increased 
activation in left PT and right IPL. In the subsequent sections, 
these findings will be discussed in more detail. 

ENHANCED ALPHA POWER DURING DEGRADED SPEECH PROCESSING 

In the current study, categorizing spectrally degraded sounds was 
accompanied by an attenuation of alpha power suppression. That 
is, relatively stronger alpha power was observed for the catego- 
rization of degraded as compared to nondegraded sounds. This 
reduction in alpha power suppression (relative to a pre-stimulus 
baseline) has previously been observed in comparing spectrally 
degraded speech stimuli to their nondegraded (intelligible) 



counter-parts (Obleser and Weisz, 2012; Becker et al., 2013). 
The current data thus extend previous findings by showing that 
increased alpha power under degradation is not restricted to 
speech material, but may reflect a more general process that 
has been interpreted before as enhanced "functional inhibition" 
(Jensen and Mazaheri, 2010), increased "idling" (Adrian and 
Matthews, 1934), or a more "active processing state" (Palva and 
Palva, 2011). 

A parsimonious interpretation of this effect relates to the func- 
tional inhibition hypothesis of increased alpha power (e.g., Jensen 
and Mazaheri, 2010). According to this approach, alpha power 
shows a relative decrease in areas subserving the processing of 
to-be-attended information (Thut et al., 2006), while it increases 
in areas subserving the processing of to-be-ignored information 
(Rihs et al., 2007). Thereby, alpha power dynamics instate a 
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FIGURE 3 I Regions of the souncls> baseline contrast in nondegraded 
(green) and degraded (red) condition (co-activation in nondegraded 
and degraded condition: yellow). Slices focus on the temporo-parietal 
network. Note that overlap/co-activation is shown as illustrative means and 
is not based on statistical measures. 



gain mechanism for neural information processing (Jokisch and 
Jensen, 2007; Kerlin et al., 2010). While the functional role of 
alpha oscillations in auditory processing and categorization has 
been examined much less often and only recently (Weisz et al., 
2011, 2013; Obleser and Weisz, 2012; Obleser et al, 2012; Becker 
et al., 2013), the interpretations provided by these previous stud- 
ies are in line with the functional inhibition hypothesis. For 
instance, it has been observed that alpha power suppression cor- 
relates with the intelligibility of auditory (speech) input (Obleser 
and Weisz, 2012; Becker et al, 2013). Alpha power suppression 
was attenuated when auditory stimuli were degraded, that is, 
when comprehension was more effortful and required higher 
demands on attention (Obleser et al., 2012), as has been suggested 
for effortful listening situations before (e.g., Shinn-Cunningham 
and Best, 2008; Wild et al, 2012). 

With respect to our data, we propose that alpha power 
increases gated the neural processing of acoustic information 
(duration vs. spectral peak) that differed in task-relevance 
between conditions: The introduction of spectral degradation in 
the second half of our experiment changed the relative informa- 
tiveness or task-relevance of the spectral and duration cues, with 
spectral peak becoming less informative than stimulus duration. 
It is thus possible that enhanced alpha under degradation indexed 
the inhibition of spectral information processing. 

Historically, however, enhanced alpha power has first been 
interpreted as reflecting the degree to which cortical areas are 
in an "idling" state (Adrian and Matthews, 1934; Niedermeyer 
and Silva, 2005). Consequently, reduction or suppression of alpha 
power was taken to index a departure from the idling mode 



Table 2 | Significant clusters obtained from whole-brain analyses 
(p < 0.005, extent threshold = 15) for the contrasts sounds > baseline 
in each condition, and the contrast degraded sounds > nondegraded 
sounds. 
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Extent 
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Abbreviations are explained in the text. Coordinates are given in Montreal 
Neurological Institute (MNI) space. 



toward a more attentive state. While this interpretation might be 
applicable for the general suppression of alpha power (vs. base- 
line) for nondegraded and degraded conditions, it cannot explain 
the differences in alpha power between conditions. That is, overall 
performance in our experiment (and thus presumably attentional 
effort) was comparable between the nondegraded and degraded 
conditions, while alpha power increased in the latter condition. 
Thus, this increase in alpha power is unlikely to reflect a more 
pronounced idling state. 
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Table 3 | Significant clusters obtained from whole-brain analyses 
(p < 0.005, extent threshold = 15) for the parametric modulators 
alpha and cue index, together with modulation differences between 
conditions. 
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Finally, it has been recently proposed that alpha power 
enhancement can also be indicative of active processing states 
(Palva and Palva, 2011). According to the "active processing 
hypothesis," enhanced alpha power underlies the coordination of 
neural processing in task-relevant cortical structures, particularly 
for higher-order attentional and executive functions. Since the 
participants in our experiment seemed to be reluctant to refrain 
from spectral cue utilization under degradation, enhanced alpha 
power may also relate to "listening" harder for spectral cues, i.e., 
to an active process of utilizing spectral cues despite their being 
less informative. Both the "functional inhibition" and "active pro- 
cessing" hypotheses can be applied to the cortical regions in which 
alpha power positively correlated with BOLD. 

SPECTRAL DEGRADATION AND THE PLANUM TEMPORALE 

In the degraded condition of our experiment, we observed posi- 
tive correlations of alpha power with BOLD activations in poste- 
rior STG and PT. The posterior STG and the PT have previously 
been suggested to subserve the processing of spectral information, 
and in particular, pitch and pitch changes (Zatorre et al., 1994; 
Zatorre and Belin, 2001; Schonwiesner et al, 2005; Hall and Plack, 
2009; Alho et al, 2014). In particular. Hall and Plack (2009) pro- 
vided evidence that apart from lateral Heschl's gyrus (Schneider 



et al., 2005; Warren et al., 2005), the (right) PT supports pitch 
processing to a substantial degree. Importantly, Hall and Plack 
(2009) used stimuli that bore close resemblance to our degraded 
sound stimuli such that participants may have perceived and pro- 
cessed pitch differences between our sound categories. Altogether, 
the involvement of pSTG and PT in our experiment is likely 
to reflect spectral processing. The positive correlation of alpha 
power and BOLD activation in this "hub"-like structure for audi- 
tory categorization (Griffiths and Warren, 2002) can shed further 
light onto the relative weighting of spectral vs. duration cues 
under degradation. 

Previous studies using simultaneous EEG-fMRI recordings 
have observed positive and negative correlations of alpha power 
with BOLD (Laufs et al, 2003; Gon^alves et al, 2006; de Munck 
et al, 2007; Goldman et al, 2009; Scheeringa et al, 2009, 2011; 
Michels et al, 2010; Liu et al., 2012). The interpretation of neg- 
ative correlations of alpha power with BOLD activations follows 
the functional inhibition hypothesis (Foxe et al, 1998; iQimesch 
et al., 2007; Foxe and Snyder, 2011; Weisz et al, 2011, 2013; 
Klimesch, 2012; Obleser and Weisz, 2012; Obleser et al., 2012). 
That is, regions where activations increase with decreasing alpha 
power have been suggested to be relevant for attending to infor- 
mative stimulus features, while regions where alpha power is 
positively correlated with BOLD haven been suggested to support 
the suppression of non-informative (task-irrelevant) stimulus 
features. Positive correlations of alpha power with BOLD can also 
be interpreted within the "active processing hypothesis" (Palva 
and Palva, 2011). This hypothesis relates enhanced alpha power 
to stronger neural coordination in cortical areas processing task- 
relevant information, particularly for higher-order attentional 
and executive functions. 

Here, we observed that the posterior STG and the PT 
showed increased activation for degraded vs. nondegraded stim- 
uli, and that STG and PT activations positively correlated 
with alpha power. This can either be interpreted with the 
"functional inhibition hypothesis" or the "active processing 
hypothesis:" 

According to the "functional inhibition hypothesis," the pos- 
itive correlation of alpha power with BOLD activation in (right) 
PT may reflect the relative inhibition of spectral information in 
this brain area. In detail, introduction of spectral degradation 
affected the informativeness of spectral peak for categorization, 
and corresponded to a change in cue utilization. That is, spec- 
tral peak became relatively task-irrelevant, and may have been 
inhibited in pSTG and PT. 

According to the "active processing hypothesis," the positive 
correlation of alpha power and BOLD activation in pSTG and PT 
(particularly under degradation) may reflect the enhanced need 
for neural coordination in order to maintain spectral cue utiliza- 
tion. Overall, cue indices remained negative even after spectral 
information was degraded, that is, participants still relied on their 
initial spectral categorization strategy. For maintenance of the 
spectral strategy, participants might have drawn on (right) pos- 
terior STG and PT resources. Thus, the positive correlation of 
alpha power and BOLD in these cortical regions may index the 
need to listen "harder" to degraded stimulus cues that once were 
informative. 
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FIGURE 4 I (A) Positive correlation of alpha power with BOLD activity in the 
degraded condition (red), and correlation differences between conditions 
(blue; co-activation: magenta). Betas extracted from orbital inferior frontal 
gyrus and PT ROI visualize correlation differences between conditions. Data 
taken from a representative participant illustrate the positive single-subject 



alpha power/BOLD correlation in the degraded condition. (B) Positive 
correlations of cue index and BOLD in the degraded condition (red) and 
correlation differences between conditions (blue). Betas extracted from IFG, 
dorso-lateral prefrontal cortex and I PL visualize the correlation differences 
between conditions. 



Finally, the "active processing hypothesis" seems to receive fur- 
ther support from the positive alpha power/BOLD correlations in 
frontal (IFG) areas. Note that Palva and Palva (20 11) suggest that 
inhibition at lower sensory levels might be achieved by higher- 
level frontal functions, such that a positive alpha power/BOLD 
correlation in IFG may indicate that lesser reliance on spectral 
than on duration cues under degradation is mediated by activ- 
ity in frontal regions. This may also relate to the observation 
that alpha power and behavioral cue utilization indices correlated 
only at trend-level with each other, suggesting that alpha power 
changes are more likely reflecting indirect, modulatory signa- 
tures of "functional inhibition" (after a stimulus while preparing 
a response, see also Obleser and Weisz, 2012; Wilsch et al., 2014). 
These signatures are dissociable from and follow in time early 
auditory signatures, accounting for the latency of the alpha power 
effect centered at around 500 ms post stimulus onset. 

A ROLE OF THE RIGHT IPL IN AUDITORY AHENTION 

The behavioral tendency of disregarding spectral cues in the 
degraded condition of our experiment was accompanied by 
increased activation in anterior prefrontal cortex, and, compared 
to the nondegraded condition, in right IPL. In the degraded con- 
dition, right IPL showed a stronger correlation of cue index with 
BOLD activation than in the nondegraded condition (Figure 4B). 
As part of the fronto-parietal executive network (Posner and 
Dehaene, 1994; Corbetta et al, 2000), the IPL has repeatedly 
been found to subserve selective attention (Shaywitz et al., 2001; 
Behrmann et al, 2004; Salmi et al., 2009) and attentional control 



(Hill and Miller, 2010). Its activation was commonly observed 
in situations that require flexible changes in attention during the 
processing of informative stimulus features or task-relevant infor- 
mation (Geng and Mangun, 2009; Schultz and Lennert, 2009; 
Gillebert et al, 2012). In line with studies supporting the IPL's 
role in selectively attending to the most informative stimulus fea- 
ture (lacquemot et al, 2003; Gaab et al., 2006; Husain et al, 
2006; Kiefer et al, 2008; Obleser et al, 2012), changes in IPL 
activation might support the change in cue utilization that was 
necessary for successful categorization (see Henry et al, 2013 
for attention to temporal features). Note however that, behav- 
iorally, participants tried to maintain their initial strategy and 
overall differed only marginally in cue utilization. Therefore, this 
interpretation must be considered carefully and substantiated by 
future research. 

SUMMARY 

In this multi-modal imaging study, we have shown that acous- 
tic cue utilization during auditory categorization is flexible, 
even though listeners seem resilient to abandon initial catego- 
rization strategies. Brain areas processing the specific acoustic 
information — spectral peak vs. duration — supported the change 
in cue preference together with areas in the fronto-parietal atten- 
tion network. Our data complement previous speech-related 
observations of alpha power increases in adverse and effortful lis- 
tening situations (Obleser and Weisz, 2012; Obleser et al, 2012; 
Wilsch et al, 2014). We suggest that increased alpha power under 
degradation mediates the relative weighting of acoustic stimulus 
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features. Both the "functional inhibition" and the "active pro- 
cessing" hypotheses can account for these findings. Importantly, 
the combination of behavioral, electrophysiological, and hemo- 
dynamic measures is an indispensable methodology for further 
investigations in auditory cognition. 
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